Advances in technology and communication infrastructure have exposed internet
services to growing threats. One of the most powerful weapons for disrupting
web-based services is the distributed denial of service (DDoS) attack. The
sophistication of the tools used to launch attacks on target systems makes it
difficult to distinguish between normal and attack traffic. Consequently, there
is a need to detect application layer DDoS attacks in network traffic
efficiently. This paper proposes a detection system, coined XGB-DDoS, that uses
a tree-based ensemble model known as eXtreme gradient boosting (XGBoost) to
detect application layer DDoS attacks. The Canadian institute for cybersecurity
intrusion detection systems (CIC IDS) 2017 dataset, consisting of both benign
and malicious traffic, was used to train and test the proposed model. The
results indicate that the accuracy, recall, precision, and F1-score of XGB-DDoS
are 0.999, 0.997, 0.995, and 0.996, respectively, as against those of k-nearest
neighbor (KNN), support vector machine (SVM), and principal component analysis
(PCA) hybridized with XGBoost, KNN, and SVM. The XGB-DDoS detection model thus
outperformed the selected models, which shows that it is effective at detecting
application layer DDoS attacks.
1. INTRODUCTION

Daily activities now depend heavily on the internet, backed by significant advances in technological and
communication infrastructures [1]. These advances, and developments in network infrastructure in particular,
have attracted many users, including malicious users who pose threats to computer networks and cyber systems,
leading to an increased incidence of cyber-attacks [1]. Prominent among these is the distributed denial of
service (DDoS) attack, which has become a major weapon of cyber-attack in recent times. DDoS is essentially
used to overload network resources such as memory and bandwidth, lowering computer network performance [2].
It is regarded as a hazardous attack on network security [2], [3]. A DDoS attack involves multiple devices,
or groups of devices, attacking a server to exhaust its network resources so that legitimate users are denied
access to network services. DDoS attacks are designed primarily to deprive legitimate users of resources and
render network services unavailable to them [3], [4]. The major targets of attackers have been the transport
and network layers, but as technology advances, the application layer has also become susceptible to DDoS
attacks. DDoS attacks on the application layer crash the server or exhaust its resources, make the service
unavailable to legitimate hosts connected to the server, or infect connected hosts through the exploitation
of application layer protocols [5], [6]. Slowloris and hypertext transfer protocol (HTTP) flooding are
examples of application layer attacks. An HTTP flood attacks web servers with HTTP GET or POST requests [6].
Slowloris attacks a web server by sending partial HTTP requests, causing the targeted server to keep opening
additional connections [6]. Figure 1 depicts the typical architecture of a DDoS attack, in which the attacker
accesses the agents indirectly through handlers; the attacker can access as many agents and handlers as are
needed to launch a DDoS attack. The agents simultaneously send many useless packets to the target victim,
exhausting the network resources and shutting down service availability [7]-[9].


[Figure 1: diagram showing the attacker, the attack traffic, and the victim host/network]

Figure 1. Typical DDoS architecture [7], [10]


Application layer DDoS attacks target web-based activities, and when they occur, all application services
are put on hold [10], [11]. As these attacks increase drastically with sophisticated tools [12], [13], the
web application layer needs more security attention. Identifying an application layer DDoS attack is more
difficult because it uses a fake address, known as a spoofed IP, to launch its attacks, which makes tracing
the source of the attack difficult [14]-[16]. Consequently, web services need efficient protection from DDoS
attacks that accurately classifies network traffic as normal or attack traffic. Many approaches have been
adopted for classifying DDoS attacks on network and web systems. Many of these are statistical methods,
usually referred to as traditional methods. These traditional methods have been found insufficient for
detecting DDoS attacks; hence, machine learning techniques are now being adopted, and machine learning-based
DDoS detection has received increased attention in recent times. Some recent machine learning-based papers
[17]-[21] are reviewed as follows. In Alhayali et al. [17], a number of machine learning optimization
algorithms were combined for feature selection and weighting and for feature subset selection (FSS).
Furthermore, multi-objective optimization was adopted to choose the fewest features without compromising FSS
accuracy. The suggested Rao-SVM, an algorithm-specific parameter-less concept built on the support vector
machine (SVM), was also examined in that work, using the KDDCup 99 and Canadian institute for cybersecurity
intrusion detection systems (CICIDS) 2017 datasets. Rao-SVM was reported to be more accurate than the other
methods, with figures of 100% on the KDDCup 99 dataset and 97.5% on the CICIDS dataset.
In another related work, a machine learning-based approach to detecting DoS attacks in a client-server
environment was proposed [18]. A dataset of traffic data and DoS attacks was fed into the proposed algorithm
for training. The trained algorithm was able to distinguish DoS attacks from other network traffic packets,
and a very high percentage of correct classifications was achieved [18]. To address the problems of intrusion
and cyber-attacks in network systems with single classifiers, [19] presented a majority voting-based ensemble
model that can be used in real time to examine network traffic and preemptively warn of some popular DoS
attacks. The proposed model was found to be very effective for detecting intrusions on network systems [19].
Rajagopal et al. [20] used stacking to provide an ensemble solution for network intrusion detection. The
suggested method was implemented using Graphlab to provide a powerful processing paradigm that handles large
amounts of data. To show the reliability of the predictions, two benchmark datasets, UNSW NB-15 and UGR '16,
were used to evaluate and validate the prediction outputs. The experimental results showed that the proposed
technique performed excellently [20]. The need for real-time detection of anomalies in network packets, using
machine learning classifiers to distinguish compromised packets from uncompromised ones between their source
and destination, was examined in [21]. The proposed detection model was evaluated and found applicable for
real-time detection, especially when sensitive information is involved.

The reviewed papers mostly adopted intrusion detection based on either optimized or basic binary
classification machine learning. In these, intrusion detection was applied either to a general network
system or to DoS attacks, whereas this paper focuses on detecting DDoS attacks on the application layer.
This paper develops a model for detecting application layer DDoS attacks using a tree-based ensemble model
known as eXtreme gradient boosting (XGBoost), alone and combined with principal component analysis (PCA),
and compares their performance with selected machine learning models [22]. The XGBoost algorithm is regarded
as an outstanding performer on classification problems because of its scalability in all scenarios, speed,
and accuracy [23]. This paper adopts XGBoost with and without PCA for detecting application layer DDoS
attacks [24].


2. METHOD 

This section discusses the proposed model, coined XGB-DDoS, and the architecture for detecting
application layer DDoS attacks. A tree-based model called XGBoost is proposed to classify malicious and
non-malicious traffic efficiently. Figure 2 depicts the proposed detection architecture, in which legitimate
and illegitimate user requests are sent to the web server through the internet. XGB-DDoS inspects each
request to determine whether it is malicious. If the traffic is malicious, XGB-DDoS detects the attack
before it reaches the server.




Figure 2. XGB-DDoS architecture 


2.1. eXtreme gradient boosting 

XGBoost is a widely used supervised machine learning algorithm for classification and regression.
Because of its scalability, data scientists use it widely to solve many machine learning challenges and
real-world scale problems. It is an improved version of the gradient boosting decision tree (GBDT) that supports


Efficient model for detecting application layer distributed denial of service ... (Morenikeji Kabirat Kareem) 


44 a ISSN: 2302-9285 


distributed and parallel computation, which makes model training faster and improves execution time and
cache optimization. The main advantages of the XGBoost algorithm are the speed, scalability, and high
performance of its gradient-boosted decision trees [25].

It is approximately ten times faster than earlier methods on a single machine, which reduces time
consumption, particularly during preprocessing. It stands out from other libraries by allowing
regularization parameters to be tuned, and 17 of the challenge-winning solutions published on Kaggle's blog
in 2015 used it [26]. XGBoost addresses overfitting as well as other classification-related issues. It
predicts by combining weak base learners to form a stronger learner, as depicted by (1). The XGBoost model
is trained in an additive manner [26], so that (1) gives the prediction.


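$$\hat{z}_i = \phi(y_i) = \sum_{k=1}^{M} f_k(y_i) \qquad (1)$$


Here $f_k(y_i)$ denotes a base learner function, $M$ the number of base learners, and $\hat{z}_i$ the
prediction for the i-th sample. XGBoost emerged to solve the problem of overfitting and some other
classification task-related problems [26]. As shown in (2), the regularized objective function must be
minimized to learn the set of functions used in the model. The objective function measures the
performance of the model during training.


$$obj(\Theta) = TL(\Theta) + \Omega(\Theta) \qquad (2)$$


$TL(\Theta) = l(z_i, \hat{z}_i)$ is the training loss, while $\Omega(\Theta)$ is the regularization term
that penalizes the complexity of the model to avoid overfitting (3). In (3), the sum runs over the $n$
training samples, $\hat{z}_i$ is the prediction of the learners already in the model, and $f_t$ is the base
learner added at the current boosting step.


$$L(\Theta) = \sum_{i=1}^{n} l\left(z_i, \hat{z}_i + f_t(y_i)\right) + \Omega(f_t) \qquad (3)$$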
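
To make (1) and (2) concrete, the following minimal Python sketch sums the outputs of two hand-written base learners and evaluates a regularized objective. The stump thresholds and leaf values, the squared-error loss, and the penalty form (a cost gamma per leaf plus an L2 term on the leaf weights, as in the XGBoost paper) are illustrative assumptions and not values taken from this work.

```python
# Illustrative sketch of (1) and (2); every number below is made up.

def make_stump(threshold, left_value, right_value):
    """Return a one-split base learner f_k(y) with two leaf weights."""
    def stump(y):
        return left_value if y < threshold else right_value
    stump.leaf_weights = (left_value, right_value)
    return stump

def predict(base_learners, y):
    """Equation (1): the prediction is the sum of all base learner outputs."""
    return sum(f(y) for f in base_learners)

def objective(base_learners, inputs, targets, gamma=0.1, lam=1.0):
    """Equation (2): training loss plus a complexity penalty."""
    training_loss = sum((z - predict(base_learners, y)) ** 2
                        for y, z in zip(inputs, targets))
    penalty = sum(gamma * len(f.leaf_weights)
                  + 0.5 * lam * sum(w * w for w in f.leaf_weights)
                  for f in base_learners)
    return training_loss + penalty

learners = [make_stump(0.5, -0.4, 0.6), make_stump(1.5, -0.1, 0.3)]
inputs = [0.0, 1.0, 2.0]      # y_i in the notation of (1)
targets = [0.0, 0.5, 1.0]     # z_i in the notation of (2)
predictions = [predict(learners, y) for y in inputs]
total = objective(learners, inputs, targets)
print(predictions, total)
```

Adding a further stump changes both terms of the objective: the training loss may fall, while the penalty always grows, which is the trade-off the regularization term is meant to control.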


XGBoost uses a Newton boosting method [26], which requires the loss function to be twice differentiable
so that the objective function can be optimized quickly. In training the model, the optimal leaf weights are
calculated for a fixed tree structure, and candidate structures are enumerated. The structure that best
optimizes the objective function is then picked; in practice, a leaf is split into a new left leaf and a new
right leaf, and the resulting gain in score is evaluated. Predictors are built sequentially, each depending
on the ones before it, so that subsequent predictors learn from the mistakes of the previous ones.
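
The leaf-weight and split-scoring step described above can be sketched as follows, assuming the closed-form expressions from the XGBoost paper: for a leaf whose samples have gradient sum G and Hessian sum H, the optimal weight is -G/(H + lambda), and a candidate split is scored by the gain in structure score. The logistic loss, the values of lambda and gamma, and the four toy labels are assumptions made for illustration.

```python
import math

def grad_hess_logistic(label, raw_score):
    """First and second derivatives of the logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - label, p * (1.0 - p)

def leaf_weight(g_sum, h_sum, lam):
    """Optimal weight of a leaf for a fixed tree structure."""
    return -g_sum / (h_sum + lam)

def leaf_score(g_sum, h_sum, lam):
    """Contribution of one leaf to the structure score (larger is better)."""
    return g_sum * g_sum / (h_sum + lam)

def split_gain(left, right, lam=1.0, gamma=0.0):
    """Gain from splitting one leaf into a left and a right leaf."""
    gl, hl = (sum(v) for v in zip(*left))
    gr, hr = (sum(v) for v in zip(*right))
    parent = leaf_score(gl + gr, hl + hr, lam)
    children = leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam)
    return 0.5 * (children - parent) - gamma

# Toy example: four samples, all starting from a raw score of 0 (p = 0.5).
labels = [0, 0, 1, 1]
stats = [grad_hess_logistic(z, 0.0) for z in labels]

good_gain = split_gain(stats[:2], stats[2:])                        # separates the classes
poor_gain = split_gain([stats[0], stats[2]], [stats[1], stats[3]])  # mixes the classes
w_left = leaf_weight(sum(g for g, _ in stats[:2]),
                     sum(h for _, h in stats[:2]), lam=1.0)
print(good_gain, poor_gain, w_left)
```

The split that separates the two classes receives a positive gain, while the split that mixes them receives none, so a greedy tree builder would choose the former.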


2.2. Principal component analysis 

PCA is an unsupervised learning and dimension-reduction technique. A large dataset with many features
is usually difficult to interpret. PCA reduces the dimension of such a dataset while preserving as much
relevant information as possible and minimizing information loss [27]. As an exploratory data analysis
method, it enables less data storage space, noise reduction, and collinearity removal. The original data of
possibly correlated variables is transformed into a set of linearly uncorrelated values, the principal
components [27].
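
As a small illustration of this transformation, the sketch below performs PCA with NumPy by eigen-decomposing the covariance matrix of standardized data. The synthetic three-feature dataset, the random seed, and the choice of two components are assumptions for the example; the paper does not report how many components were retained.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 3 features, two of them strongly correlated.
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2.0 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigen-decomposition of the covariance matrix gives the principal axes.
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]            # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

n_components = 2
X_reduced = X_std @ eigenvectors[:, :n_components]
explained_ratio = eigenvalues[:n_components].sum() / eigenvalues.sum()

# The projected columns are linearly uncorrelated (off-diagonal terms near 0).
component_cov = np.cov(X_reduced, rowvar=False)
print(X_reduced.shape, round(float(explained_ratio), 3))
```

Because two of the three synthetic features carry nearly the same information, two components retain almost all of the variance, which is the redundancy removal that PCA is used for here.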


2.3. Dataset 

This work uses a well-labeled Canadian dataset, the CIC IDS 2017 dataset [28]. It includes both
benign and malicious traffic, which was used in training and testing the proposed system. The data was
captured and recorded on Wednesday, July 5th, 2017. The dataset contains the following advanced attack
types, which are some of the primary application-layer DDoS attacks: DoS Slowloris, DoS Hulk, DoS
Slowhttptest, and DoS GoldenEye. It contains 80 extracted features and a total of 692,703 flows. Table 1
shows the types of traffic present in the dataset used in this work. Table 2 shows the number of rows used
for training and testing.


Table 1. CIC IDS dataset analysis
S/n   Traffic type       Number of flows
1     Benign             440,031
2     DoS Slowloris      5,796
3     DoS Hulk           231,073
4     DoS Slowhttptest   5,499
5     DoS Goldeneye      10,293
6     Heartbleed         11
      Total              692,703

Table 2. Data set used for the experiment
Category      Training data set   Testing data set
Benign        426,822             13,209
DoS attacks   176,870             75,802
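
The class balance implied by Tables 1 and 2 can be checked with a few lines of arithmetic. The sketch below only restates the counts from the two tables; the variable names are hypothetical.

```python
# Flow counts copied from Table 1.
flows = {
    "Benign": 440_031,
    "DoS Slowloris": 5_796,
    "DoS Hulk": 231_073,
    "DoS Slowhttptest": 5_499,
    "DoS Goldeneye": 10_293,
    "Heartbleed": 11,
}
total_flows = sum(flows.values())
class_share = {name: count / total_flows for name, count in flows.items()}

# Training and testing rows copied from Table 2.
split = {"Benign": (426_822, 13_209), "DoS attacks": (176_870, 75_802)}
train_fraction = {name: train / (train + test)
                  for name, (train, test) in split.items()}

for name, share in class_share.items():
    print(f"{name:<18}{share:8.4%}")
for name, fraction in train_fraction.items():
    print(f"{name:<18}train fraction {fraction:.3f}")
```

The shares show how unevenly the traffic types are represented, which is one reason per-class precision and recall are reported alongside accuracy.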


2.4. Dataset pre-processing 

As depicted in Figures 3 and 4, the dataset was first preprocessed to make it suitable for training the
model. All rows containing not-a-number (NaN) values were dropped. The data set has a multi-class label,
which must be transformed into numerical values that the machine can process, so a label binarizer was
adopted for the transformation. A label binarizer is a machine learning technique for converting multi-class
labels to binary labels. XGBoost can be trained without standardizing the dataset since it is a tree-based
model, but data standardization is essential when it is combined with PCA. Data standardization rescales
features to a mean of 0 and a standard deviation of 1. In PCA, features with high variance usually take
priority over features with low variance, so a standard scaler was applied to the data set to prevent
features with large scales from dominating. The rescaled data is fed into the PCA algorithm for
dimensionality reduction, which helps to remove redundant features from the data set. In training and
testing the model, the dimensionally reduced data was fed into the XGB classifier. Figure 3 depicts the steps
involved in the XGB-DDoS system; Figure 4 depicts the hybridized system using XGBoost and PCA. Algorithm 1
explains how the XGB-DDoS model distinguishes DDoS attacks from non-attacks.
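
The preprocessing steps above can be sketched in plain Python as follows. The five rows and their two features are hypothetical, and the one-vs-rest encoding mirrors what scikit-learn's LabelBinarizer produces, just as the rescaling mirrors its StandardScaler; the paper does not give further detail on how either was configured.

```python
import math
from statistics import mean, pstdev

# Hypothetical rows: two numeric features followed by a traffic label.
rows = [
    (10.0, 200.0, "BENIGN"),
    (12.0, float("nan"), "DoS Hulk"),     # dropped below: contains NaN
    (50.0, 900.0, "DoS Hulk"),
    (30.0, 400.0, "DoS slowloris"),
    (14.0, 250.0, "BENIGN"),
]

# 1. Drop every row that contains a NaN feature value.
clean = [r for r in rows if not any(math.isnan(v) for v in r[:-1])]

# 2. One-vs-rest label encoding: one binary indicator column per class.
classes = sorted({r[-1] for r in clean})
labels = [[1 if r[-1] == c else 0 for c in classes] for r in clean]

# 3. Standardize each feature to mean 0 and standard deviation 1.
columns = list(zip(*(r[:-1] for r in clean)))
scaled_columns = [[(v - mean(col)) / pstdev(col) for v in col] for col in columns]
features = [list(row) for row in zip(*scaled_columns)]

print(classes)
print(labels)
print(features)
```

Only the standardized features would be passed on to PCA; for the XGBoost-only variant the unscaled features could be used directly, since tree splits are unaffected by feature scale.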


[Figure 3: flow diagram showing the training dataset, the preprocessing phase with the label binarizer, and the XGBClassifier]


Figure 3. XGB-DDoS process

Figure 4. XGB-PCA process

Algorithm 1. XGB-DDoS model
INPUT: XGB-DDoS model, data set sample D_sample
OUTPUT: DDoS attack, non-attack

PROCESS:
    flag ← XGB-DDoS(D_sample)
    if flag == non_DDoS_attack then
        forward request
    else if flag == DDoS_attack then
        deny request access
    end if
end
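
A minimal Python rendering of the decision logic in Algorithm 1 is given below. The function `toy_classifier` is a hypothetical stand-in for the trained XGB-DDoS model (it simply thresholds the first feature), and the flag strings and return messages are illustrative names rather than part of the original system.

```python
from typing import Callable, Sequence

NON_ATTACK = "non_DDoS_attack"
DDOS_ATTACK = "DDoS_attack"

def handle_request(classify: Callable[[Sequence[float]], str],
                   sample: Sequence[float]) -> str:
    """Forward the request if the classifier flags it as benign, else deny it."""
    flag = classify(sample)
    if flag == NON_ATTACK:
        return "forward request"
    if flag == DDOS_ATTACK:
        return "request access denied"
    raise ValueError(f"unexpected flag: {flag!r}")

def toy_classifier(sample: Sequence[float]) -> str:
    """Hypothetical stand-in for the trained model: thresholds the first feature."""
    return DDOS_ATTACK if sample[0] > 100.0 else NON_ATTACK

decisions = [handle_request(toy_classifier, s)
             for s in ([12.0, 3.0], [450.0, 1.0])]
print(decisions)
```

Passing the classifier in as a callable keeps the forwarding logic independent of the model, so the same wrapper would work for either the XGB-DDoS or the XGB-PCA variant.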


3. RESULTS AND DISCUSSION 

The experiment was carried out on an Intel Core i5 computer with 8 GB of RAM and a one-terabyte disk.
The model was developed using the Python scikit-learn library in JupyterLab. The CIC IDS dataset is
well-labeled; 70% of the dataset was used for training and 30% for testing the model. The information
provided by the confusion matrix in Table 3 was used to calculate the accuracy, precision, recall, and F1
scores used to evaluate the models.
- Accuracy: the rate at which the system correctly distinguishes between DDoS attacks and non-attacks, that
is, the percentage of traffic correctly classified. It is estimated with (4), where TP, TN, FP, and FN denote
the numbers of true positives, true negatives, false positives, and false negatives, respectively.


Accuracy = (TP + TN)/(TP + TN + FP + FN) (4)


- Precision: the proportion of the traffic flagged as a DDoS attack by the system that is actually an attack;
a better system should possess a higher precision rate. It is estimated with (5).


Precision = TP/(TP + FP) (5)


- Recall or sensitivity: the proportion of actual DDoS attacks that the detection model classifies as DDoS
attacks. It is estimated with (6).


Recall or Sensitivity = TP/(TP + FN) (6)


- F1 score: a measure of the detection model's accuracy, computed as the harmonic mean of precision and
recall. It is estimated with (7).


F1 score = 2 x (Precision x Recall)/(Precision + Recall) (7)


Table 3. Confusion matrix for the XGB-DDoS (rows: actual label; columns: predicted label)
Actual label   Benign   Goldeneye   Hulk     Slowhttptest   Slowloris   Heartbleed
Benign         32,004   15          14       11             14          0
Goldeneye      16       3,061       0        1              1           0
Hulk           122      19          69,155   0              0           0
Slowhttptest   9        1           0        1,619          0           0
Slowloris      0        0           0        2              1,746       0
Heartbleed     0        0           0        0              0           1
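
Equations (4)-(7) can be applied to Table 3 with the short sketch below. It assumes that rows are actual classes and columns are predicted classes in the same order, and it uses macro averaging (an unweighted mean over classes) for the multi-class scores; neither choice is stated in the text, so the computed values need not coincide with the reported ones.

```python
# Counts copied from Table 3; row = actual class, column = predicted class
# (the orientation and the column order are assumptions).
classes = ["Benign", "Goldeneye", "Hulk", "Slowhttptest", "Slowloris", "Heartbleed"]
matrix = [
    [32004, 15, 14, 11, 14, 0],
    [16, 3061, 0, 1, 1, 0],
    [122, 19, 69155, 0, 0, 0],
    [9, 1, 0, 1619, 0, 0],
    [0, 0, 0, 2, 1746, 0],
    [0, 0, 0, 0, 0, 1],
]

total = sum(map(sum, matrix))
accuracy = sum(matrix[i][i] for i in range(len(classes))) / total   # equation (4)

per_class = {}
for i, name in enumerate(classes):
    tp = matrix[i][i]
    fn = sum(matrix[i]) - tp                    # actual class i, predicted as another
    fp = sum(row[i] for row in matrix) - tp     # predicted class i, actually another
    precision = tp / (tp + fp)                  # equation (5)
    recall = tp / (tp + fn)                     # equation (6)
    f1 = 2 * precision * recall / (precision + recall)   # equation (7)
    per_class[name] = (precision, recall, f1)

macro_precision = sum(p for p, _, _ in per_class.values()) / len(classes)
macro_recall = sum(r for _, r, _ in per_class.values()) / len(classes)
macro_f1 = sum(f for _, _, f in per_class.values()) / len(classes)
print(f"accuracy={accuracy:.4f} precision={macro_precision:.4f} "
      f"recall={macro_recall:.4f} f1={macro_f1:.4f}")
```

Under these assumptions the script gives an overall accuracy of about 0.998 and a macro-averaged recall of about 0.997.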


The results of the proposed model were compared with those of SVM and k-nearest neighbour (KNN) in
terms of accuracy, recall, F1-score, and precision. The performance of each algorithm with PCA was also
recorded. XGBoost without PCA is coined XGB-DDoS, while XGBoost with PCA is XGB-PCA. From the experimental
results, XGB-PCA and XGB-DDoS have accuracy rates of 0.997 and 0.999, respectively. Table 4 shows the
overall accuracy of the selected algorithms. KNN, SVM, KNN-PCA, and SVM-PCA have accuracy rates of 0.990,
0.994, 0.999, and 0.995, respectively.

This result shows that the XGBoost algorithm alone is sufficient and performs well without PCA; it can
also handle missing data and multi-class labels. For KNN, accuracy increased with the addition of PCA, while
SVM showed almost no improvement. Nonetheless, accuracy alone is not enough to measure a machine learning
model's performance, so other performance metrics are displayed in the bar charts. Figure 5 shows the
performance of XGB-PCA and XGB-DDoS in terms of recall, precision, and F1-score. From the experimental
results, PCA makes no significant contribution to the XGBoost algorithm. Without PCA, XGBoost outperformed
XGB-PCA; as a result, XGBoost alone is an excellent classifier.


Table 4. Accuracy performance between algorithms
Algorithm   Accuracy rate
XGB-DDoS    0.999
XGB-PCA     0.997
KNN         0.990
KNN-PCA     0.999
SVM         0.994
SVM-PCA     0.995

Figure 5. Performance comparison between XGB-DDoS and XGB-PCA


The recall, precision, and F1-score of XGB-PCA are 0.996, 0.992, and 0.994, and those of XGB-DDoS are
0.997, 0.995, and 0.996, respectively. Figure 6 shows the performance of KNN and KNN with PCA (KNN-PCA). The
recall, precision, and F1-score of KNN are 0.984, 0.916, and 0.944, and those of KNN-PCA are 0.993, 0.995,
and 0.979, respectively. The impact of PCA is evident: it improved the performance of KNN, so KNN-PCA
outperformed plain KNN. Figure 7 is the bar chart showing the performance of SVM and SVM with PCA (SVM-PCA).
There is a conspicuous difference between SVM and SVM-PCA in terms of recall and F1-score, but only a slight
difference of 0.001 in precision. In the chart, SVM has rates of 0.920, 0.960, and 0.935 for recall,
precision, and F1-score, respectively, while the recall, precision, and F1-score of SVM-PCA are 0.951, 0.961,
and 0.979, respectively. The analysis of the experiment shows that PCA has a large effect and greatly
improves the performance of the traditional machine learning algorithms. However, it does not noticeably
improve the performance of XGBoost.

Figure 6. Performance comparison between KNN and KNN-PCA

Figure 7. Performance comparison between SVM and SVM-PCA


4. CONCLUSION 

The DDoS attack, a variant of the DoS attack, is one of the most common and most serious security
challenges to networks and computing systems in recent times. Although a number of recent research efforts
have addressed DoS attacks with simple machine learning techniques, intrusion detection using ensemble and
majority voting approaches, and intrusion prediction models for general network and web service systems,
none of the works reviewed here presented a model for detecting DDoS attacks at the application layer. This
paper developed efficient models for detecting application layer DDoS attacks. A standard Canadian dataset
was employed to train and test the developed model, called XGB-DDoS. The experimental results show that the
XGB-DDoS detection model outperformed the other selected traditional machine learning models used in some
recent DDoS detection research efforts. Hence, XGB-DDoS is efficient for detecting application layer DDoS
attacks. A future direction of this work is to incorporate a mechanism capable of blocking an attack after it
has been detected.